(54) Load adaptive buffer management in packet networks



(57) The setting of the queue thresholds in active queue management schemes such as RED (random early detection) is problematic because the required buffer size for good sharing among TCP connections is dependent on the number of TCP connections using the buffer. Techniques for enhancing the effectiveness of such buffer management schemes are described. The techniques dynamically change the threshold settings as the system load, e.g., the number of connections, changes. The invention uses variables that correlate well with system load. The variables should reflect the congestion notification rate, since this rate is closely related to the TCP congestion window size, which in turn is closely related to the system load. Such a variable can be, for instance, a measured loss rate or a computed value such as the drop probability. Using these techniques, routers and switches can effectively control packet losses and TCP time-outs while maintaining high link utilization.
Description
Field of Invention 

[0001] The present invention resides in the field of congestion control in packet networks. In particular it is directed 
to buffer management techniques to be used at a node such as a router, switch, gateway, server, etc., that enable a 
network bottleneck to adapt quickly and dynamically to changes in offered load or available bandwidth so that the 
resources are better utilized and fairly shared by users. 

Background of Invention 

[0002] Congestion control in packet networks has proven to be a difficult problem, in general. The problem is particularly challenging in the Internet, where congestion control is mainly provided by end-to-end mechanisms in TCP and by routers controlling their queues. A large factor in TCP's widespread use is its ability to adapt quickly to changes in offered load or available bandwidth. Although the invention is described in connection with TCP, it should be emphasized that the concept described herein is equally applicable to other mechanisms in which a similar congestion control scheme is used.

[0003] The performance of TCP becomes significantly degraded when the number of active TCP flows exceeds the 
network's bandwidth-delay product measured in packets. When the TCP sender's congestion window becomes less 
than 4 packets, TCP is no longer able to recover from a single packet loss, since the fast-retransmit mechanism needs at least 3 duplicate acknowledgments (ACKs) to be triggered. Thus, congestion windows below 4 are not amenable to the fast-retransmit mechanism of TCP, and a single packet loss will send the connection into time-out.
[0004] With inadequate buffering, a large number of connections will tend to keep the buffers full, and the resulting packet losses will force many of the connections into time-out. As link utilization grows, premature loss may occur long before full bottleneck utilization is achieved, due to the bursty nature of IP traffic. Fine-grain bandwidth sharing, where sharing is achieved over time intervals under 1 or 2 seconds, is important for interactive applications, but is not possible unless connections avoid time-outs. One way to solve this problem is to provision routers with not just one round-trip time of buffering but buffering proportional to the total number of active flows. Many router vendors adopt the "one round-trip time" buffering approach. Although this is a step in the right direction, it only addresses the link utilization problem, not the packet loss problem. It is important to note that the requirements must support large aggregations of flows in large deployments of IP, including existing or planned commercial IP services. It has been suggested in "TCP Behavior with Many Flows" by R. Morris, IEEE International Conference on Network Protocols, Atlanta, Georgia, Oct. 1997, that buffer space of ten packets per flow is desirable. Large buffers should be possible in routers since the cost of memory is dropping rapidly due to demands in the computer industry. However, to ensure stable operation, large buffers require more active forms of management than traditional tail-drop.
[0005] The basic idea behind active queue management schemes such as RED is to detect incipient congestion early and to convey congestion notification to the end-systems, thus allowing them to reduce their transmission rates before queues in the network overflow and excessive numbers of packets are dropped. An article entitled "Random Early Detection Gateways for Congestion Avoidance" by Floyd et al., IEEE/ACM Transactions on Networking, Vol. 1, No. 4, Aug. 1993, pp. 397-413, describes the RED scheme.
[0006] The basic RED scheme (and its newer variants) maintains an average of the queue length. It then uses the average and a number of queue thresholds to detect congestion. RED schemes drop packets in a random, probabilistic manner where the probability is a function of recent buffer fill history. The objective is to provide a more equitable distribution of packet loss, avoid the synchronization of flows, and at the same time improve the utilization of the network. The setting of the queue thresholds in RED schemes is problematic because the buffer size needed for good sharing is dependent on the number of TCP connections using the buffer. To keep latency at the router low, it may be desirable to set the thresholds low. But setting them too low will cause many time-outs, which drastically degrade the latency perceived by the user. On the other hand, setting the thresholds too high unnecessarily increases the latency when operating with a small number of connections. This means the setting of the thresholds should not be done in an ad hoc manner but should be tied to the number of active connections sharing the same buffer.
[0007] It is important to note that high network utilization is only good when the packet loss rate is low. This is because high packet loss rates can negatively impact overall network and end-user performance. A lost packet consumes network resources before it is dropped, thereby impacting the efficiency in other parts of the network. As noted earlier, high packet loss rates also cause long and unpredictable delays as a result of TCP time-outs. It is therefore desirable to achieve high network utilization but with low packet loss rates. This means that even if large buffers are used in the network to achieve high utilization, appropriate steps must also be taken to ensure that packet losses are low.
[0008] It would enhance the effectiveness of RED and other similar schemes if the threshold settings were dynamically changed as the number of connections changes. The article "Scalable TCP Congestion Control" by R. Morris, Ph.D. Thesis, Harvard University, available as of Dec. 3, 1999 at the author's website http://www.pdos.lcs.mit.edu/~rtm/papers/tp.pdf, describes a technique to control packet losses by dynamically changing the queue threshold setting. The technique requires counting the number of connections by inspecting packet control headers. This technique, however, is impractical in real core networks. A short article on the same subject having the same title is also available at the website.

[0009] It is therefore envisaged that it would be better to have other measures for adjusting the thresholds in order 
to keep the loss rate at or below a specified value that would not cause too many time-outs. This would ensure robust 
operation of the data sources while also keeping the queue length as small as possible. The control strategy of the 
invention does not involve flow accounting complexities as in the afore-mentioned article by R. Morris. 
[0010] In one embodiment, by estimating the actual loss rate and changing the buffer management parameters (e.g., thresholds) accordingly, it can be ensured that the loss rate will be controlled around a pre-specified target loss rate
value. As the number of connections increases, the threshold will be adjusted upward to maintain a loss rate that will 
not cause excessive time-outs. As the number of connections decreases, the threshold will be adjusted downwards 
to keep the latency at the router as small as possible. This is to prevent excessive queue buildup when the number of 
flows is low. In further embodiments, the actual loss rate can be estimated by using a computed value, such as a drop 
probability or a measured value of packet loss over time. 

[0011] Current TCP implementations expect that the router will drop packets as an indication of congestion. There 
have been proposals for indicating congestion by marking the packet rather than dropping it. It is also possible to 
indicate congestion by generating a congestion notification message directly back to the sender, thus avoiding the 
round trip delay. Such implementations can reduce the total buffer required per flow but still benefit from adjusting the 
buffer management to ensure that enough, but not too much, buffer is made available for the number of flows. 

Summary of Invention 

[0012] The invention therefore resides in the field of buffer management schemes of a packet network. In accordance 
with one aspect, the invention is directed to a method of managing a buffer at an outgoing link of a node. The method 
comprises steps of monitoring the status of a queue in relation to a queue threshold and generating congestion notifications to data sources in response to the status of the queue. The method further includes steps of computing an
indication concerning a system load of the node from the rate of congestion notifications and adjusting the queue 
threshold in response to the indication to keep the operation of the data sources within a preferred envelope. 
[0013] In accordance with a further aspect, the invention is directed to a mechanism for managing a buffer at an 
outgoing link of a node in a packet network. The mechanism comprises a queue for buffering packets to be transmitted 
from the node onto the outgoing link and a first controller for monitoring the status of the queue with respect to a first 
queue threshold and generating congestion notifications to data sources in response to the status of the queue. The
mechanism further includes a parameter estimation block for generating an indication concerning a system load of the 
node and a second controller for adjusting the first queue threshold in response to the indication to keep the operation 
of the data sources within a preferred envelope. 

Brief Description of Drawings 

[0014] Figure 1 is a schematic block diagram of an implementation according to an embodiment of the invention. 

[0015] Figure 2 is a schematic illustration of a two-level control strategy. 

[0016] Figure 3 is a schematic illustration of limiting the impact of setpoint changes. 

[0017] Figure 4 is a block diagram of a ramp unit. 

[0018] Figure 5 is a graph showing a drop probability as a function of average queue length. 

[0019] Figure 6 is a graph showing a RED probability parameter. 

Detailed Description of Preferred Embodiments of Invention 

[0020] The concept of the invention is to adjust thresholds of a queue at a node in relation to the system load (i.e., the number of connections or flows). The main concepts will be described in detail in connection with active queue management schemes, e.g., RED, DRED (Dynamic Random Early Detection), etc., which use a random packet drop mechanism. They are, however, general enough to be applicable to other similar queue management schemes and schemes
which use packet marking or direct backward congestion notification messages. DRED is an improved algorithm for 
active queue management and is capable of stabilizing a router queue at a level independent of the number of active 
connections. An applicant's copending application entitled "Method and Apparatus for Active Queue Management 
Based on Desired Queue Occupancy" (filing particulars not available) has inventors common to the inventors of the 
present application and describes the DRED scheme in detail. 
[0021] This adaptation of queue thresholds to the system load is important because the required buffer size (and consequently the threshold settings) for good sharing among TCP connections is dependent on the number of TCP connections using the buffer. The invention uses variables that correlate well with system load. The variables should reflect the congestion notification rate, since this rate is closely related to the TCP congestion window size, which in turn is closely related to the system load. Such a variable can be, for instance, a measured loss rate or a computed value such as the drop probability. In schemes such as RED, the computed value of drop probability is a good variable, since a measured value must, of necessity, reflect some past time period.

[0022] The behavior of TCP is reviewed here. When the packet loss rate is low, TCP can sustain its sending rate by the fast-retransmit/fast-recovery mechanism. The fast-retransmit/fast-recovery mechanism helps in the case of isolated losses, but not in the case of burst losses (i.e., multiple losses in a single window). As the packet loss rate increases, retransmissions become driven by time-outs. Packet loss affects the achievable bandwidth of a TCP connection. The achievable bandwidth can be computed as the window size $W$ of a connection divided by its round-trip time. In an ideal scenario where flows are relatively long and do not experience any time-outs, and where losses are spread uniformly over time, a simple model for the average size of the congestion window $W$ of a TCP connection in the presence of loss is:



$$W = \sqrt{\frac{8}{3bp}}$$



where $b$ is the number of packets that are acknowledged by a received ACK, typically 2 since most TCP implementations employ "ack-every-other-packet" policies, and $p$ is the probability that a packet is dropped. As $p$ increases, $W$ becomes smaller. This equation can be seen as approximating the average number of packets a TCP source will have in flight, given the loss rate $p$.
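As a quick numerical illustration of the window model above, the short Python sketch below evaluates $W = \sqrt{8/(3bp)}$ for a few loss rates with $b = 2$. The function name `tcp_avg_window` and the sample loss rates are illustrative choices, not part of the described method, and the model only holds under the idealized assumptions stated (long flows, no time-outs, uniformly spread losses).

```python
import math


def tcp_avg_window(p: float, b: int = 2) -> float:
    """Average congestion window W (in packets) from W = sqrt(8 / (3 * b * p)).

    p is the packet drop probability and b the number of packets
    acknowledged per ACK (2 for ack-every-other-packet receivers).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("drop probability p must be in (0, 1]")
    if b <= 0:
        raise ValueError("b must be positive")
    return math.sqrt(8.0 / (3.0 * b * p))


if __name__ == "__main__":
    # Higher loss rates shrink the window toward the 4-packet region where
    # fast retransmit can no longer recover from a single loss.
    for p in (0.001, 0.01, 0.05):
        print(f"p = {p:.3f} -> W = {tcp_avg_window(p):.2f} packets")
```

With $b = 2$, a 1% loss rate gives $W \approx 11.5$ packets, while 5% gives about 5.2 packets.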

[0023] For $N$ connections, the number of packets in flight will be proportionately larger, and in the presence of congestion these packets will be stored in the congestion buffer. Therefore, if the loss rate is to be maintained at a given level over a wide range of numbers of connections, then in order to prevent congestion collapse it is desirable to provide buffering proportional to the system load (that is, the number of connections or flows). It can be deduced from the above equation that buffering that automatically adapts to the number of flows, while at the same time limiting time-outs and queue length and maintaining high utilization, is desirable. Previous work such as Morris has suggested counting the number of flows by inspecting packet control headers, but this has some expense in implementation and is problematic when encryption is present. Since the window size of a flow is closely linked to the loss rate, it is possible to estimate the number of flows by using the actual loss rate to determine the size of window the flows are using and dividing the total buffer occupancy by this window size. However, this invention uses the fact that the primary objective is to keep the window size of the flows above some minimum value and, in some embodiments, that value corresponds to a loss rate target value. This objective is achieved without needing to know the actual number of flows.
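Inverting the same window model shows how a minimum window size maps to a loss rate target, and how a flow count could in principle be estimated from the loss rate (the approach noted above as possible but not needed). The Python sketch below is illustrative only: the helper names, the 4-packet minimum window (taken from the fast-retransmit discussion in [0003]) and the example numbers are assumptions, and treating all buffered packets as belonging to flows with window $W$ is a rough simplification.

```python
import math


def window_from_loss(p: float, b: int = 2) -> float:
    """Average TCP window W = sqrt(8 / (3 b p)), in packets."""
    return math.sqrt(8.0 / (3.0 * b * p))


def loss_for_min_window(w_min: float, b: int = 2) -> float:
    """Loss rate at which the model's average window equals w_min.

    Solving W = sqrt(8 / (3 b p)) for p gives p = 8 / (3 b w_min**2).
    Keeping the loss rate at or below this value keeps W >= w_min.
    """
    return 8.0 / (3.0 * b * w_min ** 2)


def estimate_flow_count(buffered_pkts: float, p: float, b: int = 2) -> float:
    """Rough flow count: buffered packets divided by the per-flow window
    implied by loss rate p (hypothetical helper, for illustration only)."""
    return buffered_pkts / window_from_loss(p, b)


if __name__ == "__main__":
    p_target = loss_for_min_window(4.0)
    print(f"loss rate keeping W >= 4 packets: {p_target:.4f}")
    print(f"flows implied by 600 buffered packets at p = 1%: "
          f"{estimate_flow_count(600, 0.01):.1f}")
```

With $b = 2$, the 4-packet minimum corresponds to $p = 1/12 \approx 8.3\%$; the 5% example target used later in the text sits below this bound.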

[0024] Figure 1 shows a block diagram of the control technique according to an embodiment of the invention. The figure shows two loops. The inner loop 12 is composed of the process 14 and the packet drop controller 16, and the outer loop 18 adjusts the packet drop controller parameters based on the operating conditions. The process is shown to include TCP sources 20, where each of these sources is an originating system of TCP traffic. The process is considered to form part of the two loops because it responds to RED or DRED congestion notifications that affect the packet arrival rate. The block diagram shows a parameter estimator 22 that estimates the parameters (e.g., an indication of the load) of the process based on observations of process inputs (e.g., packet drop probability 24) and, if required, outputs (e.g., queue sizes 26). There is also a controller design block 28 that computes the packet drop controller parameters based on the output of the parameter estimator. The output of the controller design block (i.e., the computed parameters) is then used to perform queue threshold adjustments. The process parameters are computed (or estimated) continuously, and the packet drop controller parameters are updated when new parameter values (e.g., a new indication of the system load) are obtained.

[0025] Referring to Figure 1, the technique dynamically changes the queue threshold(s) of the packet drop controller 16 as the number of connections in the queue changes. The packet drop controller adjusts to the number of connections by inspecting a load indication (e.g., drop probability, measured loss rate, etc.) and adjusting the thresholds accordingly. As the number of connections increases, the threshold will be adjusted upward to maintain a loss rate that would not cause too many time-outs. As the number of connections decreases, the threshold will be adjusted downwards to keep the latency at the router as small as possible. This is to prevent excessive queue buildup when the number of flows is low. In this embodiment, a two-level control strategy is adopted, where a high-level controller (operating in the outer loop 18 on a slower time scale) sets the queue threshold(s) and a low-level controller (operating in the inner loop 12 on a faster time scale) computes the packet drop probability. It is assumed that there is sufficient buffering $B$ such that the queue threshold(s) can be varied as needed. It has been suggested that ten times as many buffers as the number of expected flows be provided. Figure 2 illustrates the two-level control strategy.

[0026] The high-level controller can be viewed as a "quasi-static" or "quasi-stationary" controller. That is, when there are any system disturbances or perturbations (due to, for example, a change in the number of connections), the system is allowed to settle into a new steady state before the queue threshold(s) is changed. This ensures that the threshold(s) is not changed unnecessarily, which would disturb the computation (and stability) of the packet drop probability, which in turn depends on the threshold settings. As a result, the queue threshold(s) is typically piece-wise constant, with changes occurring at a slower pace.

[0027] The actual loss rate can be measured by observing the packet arrival process to the queue, or can be estimated by observing some parameters of the packet drop controller, available in some buffer management schemes (e.g., DRED, RED). In these buffer management schemes, the computed packet drop probability is a good measure of the actual loss rate to be used at 40 (Figure 2), since it asymptotically approximates the loss rate very well. The measured or computed packet drop probability can therefore be used as an indicator for varying the queue threshold(s) to further control packet losses.
[0028] The queue threshold(s) can be varied dynamically to keep the packet loss rate close to a pre-specified target loss rate value $\theta_{\max}$, since TCP time-outs are very dependent on losses. Note that a target loss rate can only be attained if the network is properly engineered and there are adequate resources (e.g., buffers, capacity, etc.). Most random packet drop schemes typically have two queue thresholds, an upper threshold $T$ and a lower threshold $L$. In one embodiment of the invention, the upper threshold $T$ is selected as the manipulating variable (to achieve the desired control behavior), while the lower threshold $L$ is tied to the upper threshold through a simple linear relationship, e.g., $L = bT$, where $b$ is a constant relating $L$ to $T$. $L$ can also be set to a fixed known value where appropriate, as will be explained later. The control target $T$ can then be varied dynamically to keep the packet loss rate close to the pre-specified value $\theta_{\max}$.

[0029] In a further embodiment, the measured or computed packet loss rate $p_l$ is filtered (or smoothed) to remove transient components before being used in the high-level controller. The smoothed signal is obtained using an EWMA (exponentially weighted moving average) filter with gain $\gamma$ (more weight given to the history):

$$\hat{p}_l \leftarrow (1-\gamma)\,\hat{p}_l + \gamma\, p_l, \qquad 0 < \gamma < 1$$

The actual packet loss rate is approximated by $\hat{p}_l$. If no filtering is required, then $\hat{p}_l = p_l$. The target loss rate is selected, for example, to be $\theta_{\max} = 5\%$.
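The EWMA smoothing step can be written as a small function. In this Python sketch the function name `ewma_loss`, the gain value in the example, and the choice of seeding the filter with the first sample when no initial value is given are assumptions for illustration.

```python
from typing import Iterable, List, Optional


def ewma_loss(samples: Iterable[float], gamma: float,
              initial: Optional[float] = None) -> List[float]:
    """Smooth loss-rate samples with p_hat <- (1 - gamma) * p_hat + gamma * p.

    A small gamma gives more weight to the history (heavier smoothing).
    If no initial value is supplied, the filter is seeded with the first sample.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must satisfy 0 < gamma < 1")
    smoothed: List[float] = []
    p_hat = initial
    for p in samples:
        if p_hat is None:
            p_hat = p  # seed the filter with the first sample
        else:
            p_hat = (1.0 - gamma) * p_hat + gamma * p
        smoothed.append(p_hat)
    return smoothed
```

For example, with $\gamma = 0.5$ and an initial value of 0, three samples of 10% loss produce smoothed values of 5%, 7.5% and 8.75%.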

[0030] Two algorithms for dynamically adjusting the threshold T to achieve the desired control performance are 
described below according to embodiments of the invention. 


Algorithm 1: 
[0031] 

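Basic Mechanism:

If $|\hat{p}_l - \theta_{\max}| > \varepsilon$ continuously for $\delta$ sec, then

$$T \leftarrow \left[\,T + \Delta T \cdot \operatorname{sgn}[\hat{p}_l - \theta_{\max}]\,\right]_{T_{\min}}^{T_{\max}}$$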

Sample Implementation:

    begin: get new p̂_l value
        start = time_now
        while |p̂_l - θ_max| > ε
            stop = time_now
            if stop - start >= δ sec
                T ← [T + ΔT·sgn[p̂_l - θ_max]] limited to [T_min, T_max]
                break out of while loop
            endif
            get new p̂_l value
        endwhile
        go to begin

/* time_now is a free running system clock */ 



where

$\varepsilon$ is a tolerance value to eliminate unnecessary updates of $T$ (e.g., $\varepsilon = 2\%$)
$\delta$ is an elapsed time used to check the loss rate mismatch (e.g., $\delta = 1$ sec)
$\Delta T$ is the control step size and is given as $\Delta T = B/K$, i.e., the buffer size $B$ is divided into $K$ bands (e.g., $K = 8$)
$\operatorname{sgn}[\cdot]$ denotes the sign of $[\cdot]$
$[\cdot]_{T_{\min}}^{T_{\max}}$ denotes limiting the value to the range $[T_{\min}, T_{\max}]$
$T_{\max}$ is an upper bound on $T$, since $T$ cannot be allowed to be close to $B$, which would result in drop-tail behavior
$T_{\min}$ is a lower bound on $T$ in order to maintain high link utilization, since $T$ should not be allowed to be close to 0. $T_{\min}$ can be set to one bandwidth-delay product worth of data.

The above procedure ensures that $T$ is only changed when it is certain that the packet loss rate has deviated from the target by more than $\varepsilon$ (e.g., 2%) for at least $\delta$ seconds. The system then tries to maintain losses within the range $\theta_{\max} \pm \varepsilon$.
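A Python sketch of Algorithm 1 follows, written as a controller that is fed timestamped values of $\hat{p}_l$ rather than as a blocking loop. The class and parameter names and the example numbers ($B = 800$ packets, $K = 8$, $T_{\min} = 100$, $T_{\max} = 700$) are assumptions; the defaults $\theta_{\max} = 5\%$, $\varepsilon = 2\%$ and $\delta = 1$ s follow the examples in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


def sgn(x: float) -> int:
    """Sign of x: +1, 0 or -1."""
    return (x > 0) - (x < 0)


@dataclass
class Algorithm1Threshold:
    T: float                  # current upper queue threshold (packets)
    T_min: float              # lower bound on T
    T_max: float              # upper bound on T
    delta_T: float            # step size, e.g. B / K
    theta_max: float = 0.05   # target loss rate
    eps: float = 0.02         # loss tolerance
    delta: float = 1.0        # seconds the mismatch must persist
    _start: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, now: float, p_hat: float) -> float:
        """Feed one smoothed loss sample taken at time `now`; return T."""
        err = p_hat - self.theta_max
        if abs(err) <= self.eps:
            self._start = None          # mismatch ended: reset the timer
            return self.T
        if self._start is None:
            self._start = now           # start timing a new mismatch
        elif now - self._start >= self.delta:
            # Mismatch persisted for delta seconds: step T, then clamp it.
            new_T = self.T + self.delta_T * sgn(err)
            self.T = max(self.T_min, min(self.T_max, new_T))
            self._start = None          # "go to begin": time a fresh mismatch
        return self.T


if __name__ == "__main__":
    B, K = 800, 8
    ctl = Algorithm1Threshold(T=300, T_min=100, T_max=700, delta_T=B / K)
    for t, p in [(0.0, 0.09), (0.5, 0.09), (1.0, 0.09), (1.5, 0.05)]:
        print(t, ctl.update(t, p))
```

Here the loss rate sits 4 points above target from $t = 0$, so $T$ is raised by one step once the mismatch has lasted $\delta = 1$ s, and is left alone when the loss rate returns to the target.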

Algorithm 2: 
[0032] 

Basic Mechanism: 

The queue threshold $T$ is only changed when the loss rate $\hat{p}_l$ is either above or below the target loss rate $\theta_{\max}$ and the loss rate is deviating further from the target loss rate.

EP 1 128 610 A2

Sample Implementation:

    every 5 seconds do the following:
        if p̂_l(now) - θ_max > 0.0 then
            if p̂_l(now) - p̂_l(previous) > 0.0 then
                increase T: T ← min(T + ΔT, T_max)
            else do not change T
        else if p̂_l(now) - θ_max < 0.0 then
            if p̂_l(now) - p̂_l(previous) < 0.0 then
                decrease T: T ← max(T - ΔT, T_min)
            else do not change T

In the above procedure there is no notion of a loss tolerance $\varepsilon$. Once the loss rate goes above (below) the target loss rate and keeps rising (falling), indicating that there is a drift away from the target, the queue threshold $T$ is increased (decreased). Otherwise the threshold is "frozen", since further changes can cause the loss rate to deviate further from the target loss rate. The deviations $\hat{p}_l(\text{now}) - \hat{p}_l(\text{previous})$ in the above procedure can be computed over shorter sampling intervals (e.g., periods smaller than 5 seconds) or over longer intervals, depending on the measurement overhead allowed.
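Algorithm 2 can be sketched the same way. Here `update` is assumed to be called once per sampling period (5 s in the text) with the current smoothed loss rate; the class name and the decision to leave $T$ unchanged on the very first sample, when no previous value exists, are assumptions.

```python
from typing import Optional


class Algorithm2Threshold:
    """Step T up (down) only while the loss rate is above (below) target
    and still moving away from it; otherwise T is frozen."""

    def __init__(self, T: float, T_min: float, T_max: float,
                 delta_T: float, theta_max: float = 0.05) -> None:
        self.T = T
        self.T_min = T_min
        self.T_max = T_max
        self.delta_T = delta_T
        self.theta_max = theta_max
        self._prev: Optional[float] = None  # p_hat from the previous period

    def update(self, p_now: float) -> float:
        if self._prev is not None:
            drift = p_now - self._prev
            if p_now - self.theta_max > 0.0 and drift > 0.0:
                self.T = min(self.T + self.delta_T, self.T_max)  # increase T
            elif p_now - self.theta_max < 0.0 and drift < 0.0:
                self.T = max(self.T - self.delta_T, self.T_min)  # decrease T
        self._prev = p_now
        return self.T


if __name__ == "__main__":
    ctl = Algorithm2Threshold(T=300, T_min=100, T_max=700, delta_T=100)
    for p in (0.06, 0.08, 0.07, 0.03, 0.04):
        print(p, ctl.update(p))
```

In this trace $T$ rises while losses are above 5% and increasing, holds while they are above target but falling back, drops once losses are below target and still falling, and holds again when they start recovering.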
[0033] In most buffer management schemes, the control loops have constant thresholds (or setpoints). But, as discussed above, the thresholds may change at certain time instances because of a desire to change operating conditions such as user delays, loss rates, etc. A threshold is, as a result, typically piece-wise constant, with changes occurring less frequently. It is therefore suitable to view the threshold as a step function. Since the threshold is a system disturbance that can be accessed, it is possible to feed it through a low-pass filter or a ramping module before it enters the low-level controller. In this way, the step function can be made smoother. This property can be useful, since most control designs that have good rejection of load disturbances give large overshoots after a sudden change in the threshold. Smoothing of the queue threshold is particularly important when the step change $\Delta T$ is large. In this way, the command signals from the high-level controller can be limited so that setpoint changes are not generated at a faster rate than the system can cope with.

[0034] Figure 3 shows a scheme for limiting the impact of setpoint changes. In the figure, a low-pass filter or ramping unit 50 is located between low-level controller 52 and high-level controller 54. As in the earlier figures, the two controllers take in the same inputs and generate similar outputs, except that the queue threshold(s) from the high-level controller is passed through filter 50 to smooth out the signal.

[0035] Figure 4 is a block diagram of a ramp unit or rate limiter that can replace the low-pass filter. The output of the ramp unit will attempt to follow the input signal. Since there is an integral action in the ramp unit, the input and the output will be identical in steady state. Since the output is generated by an integrator with a limited input signal, the rate of change of the output will be limited to the bounds given by the limiter. Figure 4 can be described by the following equations:



$$\frac{dy}{dt} = \operatorname{sat}(e) = \operatorname{sat}(T - y)$$

in the continuous-time domain, and

$$\Delta y(n) = y(n) - y(n-1) = \operatorname{sat}\big(T - y(n-1)\big)$$

$$y(n) = y(n-1) + \operatorname{sat}\big(T - y(n-1)\big)$$

in the discrete-time domain.



The amplitude limiter or saturation $\operatorname{sat}(e)$ is defined as

$$\operatorname{sat}(e) = \begin{cases} -a, & e < -a \\ e, & |e| \le a \\ a, & e > a \end{cases}$$



where the limit $a$ can be defined as a small fraction of the step size $\Delta T$, i.e., $a = g \cdot \Delta T$, $0 < g < 1$, if $\Delta T$ is large enough to require smoothing of $T$. Smoothing may not be required for small $\Delta T$. In Figure 4, $y$ is the smoothed threshold input to the low-level controller. The smoothing of $T$ can be implemented in discrete time steps simply as follows:

    initialize y
    while e = T - y ≠ 0
        y ← y + sat(e)
        pass y to the low-level controller
        wait for next y computing time
    endwhile

The time interval between the y computations should be smaller than the time interval between the threshold changes.
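The discrete-time ramp unit can be sketched in Python as below. The function names and the example values (a threshold step from 300 to 400 packets with $g = 0.25$, so $a = 25$) are assumptions. One detail differs from a literal transcription: when the remaining error is within the limit, the sketch assigns $y = T$ directly, which is what $y + \operatorname{sat}(T - y)$ equals mathematically, but it avoids a floating-point loop whose error might never reach exactly zero.

```python
from typing import List


def sat(e: float, a: float) -> float:
    """Amplitude limiter: clamp e to the interval [-a, a]."""
    return max(-a, min(a, e))


def ramp_threshold(T: float, y: float, a: float) -> List[float]:
    """Return the smoothed thresholds y(n) = y(n-1) + sat(T - y(n-1)) produced
    until y reaches the new setpoint T. Each value would be passed to the
    low-level controller at successive y-computation times."""
    if a <= 0:
        raise ValueError("limit a must be positive")
    outputs: List[float] = []
    while y != T:
        e = T - y
        if abs(e) <= a:
            y = T               # final step: sat(e) = e, land exactly on T
        else:
            y = y + sat(e, a)   # rate-limited step of size a toward T
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    delta_T, g = 100.0, 0.25
    print(ramp_threshold(T=400.0, y=300.0, a=g * delta_T))
    # prints [325.0, 350.0, 375.0, 400.0]
```

A single 100-packet step in $T$ thus reaches the low-level controller as four 25-packet increments.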
An embodiment using the DRED algorithm 

[0036] As mentioned earlier, the Dynamic-RED (DRED) algorithm is an active queue management technique which uses a simple control-theoretic approach to stabilize a router queue occupancy at a level independent of the number of active connections. The benefits of a stabilized queue in a network are high resource utilization, predictable delays, and traffic-load-independent network performance (in terms of traffic intensity and number of connections).

[0037] The actual queue size in the router is assumed to be measured over a window of $\Delta t$ units of time (seconds), and the packet drop controller provides a new value of the packet drop probability $p_d$ every $\Delta t$ units of time. Therefore, $\Delta t$ is the sampling/control interval of the system. Let $q(n)$ denote the actual queue size at discrete time $n$ (i.e., at time $n\Delta t$), and let $T(n)$ denote the target buffer occupancy. The goal of the controller is therefore to adapt $p_d$ so that the magnitude of the error signal $e(n) = q(n) - T(n)$ is kept as small as possible.

[0038] A lower queue threshold parameter $L$ is introduced in the control process to help maintain high link utilization and keep the queue size around the target level. The parameter $L$ is typically set a little lower than $T$, e.g., $L = bT$, $b \in [0.8, 0.9]$. DRED does not drop packets when $q(n) < L$, in order to maintain high resource utilization and also not to further penalize sources which are in the process of backing off in response to (previous) packet drops. Note that there is always a time lag between the time a packet is dropped and the time a source responds to the packet drop. The computation of $p_d$, however, still continues even if packet dropping is suspended (when $q(n) < L$).
[0039] The DRED computations can be summarized as follows, assuming a slowly varying threshold $T(n)$:

DRED control parameters:

Control gain $\alpha$; filter gain $\beta$; target buffer occupancy $T(n)$;
minimum queue threshold $L(n) = bT(n)$, $b \in [0.8, 0.9]$

At time $n$:

Sample queue size: $q(n)$

Compute current error signal: $e(n) = q(n) - T(n)$
Compute filtered error signal: $\hat{e}(n) = (1-\beta)\,\hat{e}(n-1) + \beta\, e(n)$
Compute current packet drop probability:

$$p_d(n) = \min\left\{\max\left[p_d(n-1) + \alpha\,\frac{\hat{e}(n)}{2T(n)},\ 0\right],\ 1\right\}$$

The $2T(n)$ term in the above computation is simply a normalization parameter, so that the chosen control gain $\alpha$ can be preserved throughout the control process since $T$ can vary. The normalization constant in the DRED algorithm is generally the buffer size $B$.

Use $p_d(n)$ as the packet drop probability until time $n+1$, when a new $p_d$ is to be computed again.

Store $\hat{e}(n)$ and $p_d(n)$ to be used at time $n+1$.
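One DRED update step, as summarized above, can be sketched in Python. The class name, the default $b = 0.85$, the deliberately large example gains ($\alpha = \beta = 0.5$, chosen only to make the numbers easy to follow) and the `drop_enabled` helper, which applies the rule that no packets are dropped while $q(n) < L = bT(n)$, are assumptions for illustration; the drop probability is clamped to the range $[0, 1]$.

```python
from dataclasses import dataclass


@dataclass
class DredController:
    alpha: float          # control gain
    beta: float           # error filter gain, 0 < beta < 1
    b: float = 0.85       # L(n) = b * T(n), with b in [0.8, 0.9]
    e_hat: float = 0.0    # filtered error from the previous interval
    p_d: float = 0.0      # drop probability from the previous interval

    def step(self, q: float, T: float) -> float:
        """Run one DRED update at time n with queue sample q(n) and target T(n)."""
        e = q - T                                                    # current error
        self.e_hat = (1.0 - self.beta) * self.e_hat + self.beta * e  # filtered error
        # Integral-type update normalized by 2T(n), clamped to [0, 1].
        self.p_d = min(max(self.p_d + self.alpha * self.e_hat / (2.0 * T), 0.0), 1.0)
        return self.p_d

    def drop_enabled(self, q: float, T: float) -> bool:
        """Packets are only dropped when q(n) >= L(n) = b * T(n);
        p_d keeps being updated either way."""
        return q >= self.b * T


if __name__ == "__main__":
    ctl = DredController(alpha=0.5, beta=0.5)
    for q in (200, 200, 0):
        print(ctl.step(q, T=100), ctl.drop_enabled(q, T=100))
```

A queue held at twice the target drives $p_d$ up (0.125, then 0.3125); when the queue empties, the filtered error turns negative and $p_d$ starts to fall (0.28125) while dropping is suspended.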



[0040] In DRED, a good measure of the actual packet loss rate is the packet drop probability $p_d(n)$, which converges asymptotically to the actual loss rate. Thus, $p_d(n)$ approximates the actual loss rate very well. Therefore, $p_l(n) = p_d(n)$ is used in this embodiment as an indicator for varying the control target $T$ in order to reduce losses. The $p_l(n) = p_d(n)$ values are filtered as described earlier to obtain $\hat{p}_l(n)$, which is then used in the high-level controller. The filtered values are computed as follows:

$$\hat{p}_l(n) \leftarrow (1-\gamma)\,\hat{p}_l(n-1) + \gamma\, p_d(n), \qquad 0 < \gamma < 1$$
An embodiment using the RED algorithm 

[0041] RED maintains a weighted average of the queue length, which it uses to detect congestion. When the average queue length exceeds a minimum threshold $L$, packets are randomly dropped with a given probability. The probability that a packet arriving at the RED queue is dropped depends on, among other things, the average queue length, the time elapsed since the last packet was dropped, and a maximum packet drop probability parameter $\max_p$. When the average queue length exceeds a maximum threshold $T$, all arriving packets are dropped.

[0042] The basic RED algorithm is as shown in the following pseudo-code:

RED control parameters: filter gain $w_q$; minimum queue threshold $L$; maximum queue threshold $T$; maximum packet drop probability $\max_p$.

    for each packet arrival
        calculate the new average queue size avg
        if L ≤ avg < T
            calculate probability p_a
            with probability p_a:
                mark/drop the arriving packet
        else if T ≤ avg
            drop the arriving packet

where avg is the average queue size and $p_a$ is the RED final packet drop probability. $L$ and $T$ are the threshold parameters that control the average queue length and are defined by the network manager.

[0043] The average queue size, avg, is maintained by the RED gateway using an EWMA filter:

$$\mathit{avg} \leftarrow (1 - w_q)\cdot \mathit{avg} + w_q \cdot q$$

where $w_q$ is the filter gain and $q$ is the current queue size. The RED final drop probability $p_a$ is derived such that

$$p_b \leftarrow \max_p \cdot \frac{\mathit{avg} - L}{T - L}$$

$$p_a \leftarrow \frac{p_b}{1 - \mathit{count} \cdot p_b}$$

where $\max_p$ is a scaling parameter that defines the maximum drop probability, $p_b$ is a RED temporary drop probability, and count is a variable that keeps track of the number of packets that have been forwarded since the last drop. The use of the variable count in this equation causes the packet drops to be evenly distributed over the interval $[1, 1/p_b]$. In RED gateways, when $L$ and $T$ are fixed, the overall loss rate is controlled by the value $\max_p$.
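The RED per-packet decision described in [0042] and [0043] can be sketched as follows. This is a simplified illustration rather than Floyd et al.'s full algorithm: it omits the idle-period adjustment of avg, the class and method names are hypothetical, and resetting count to 0 whenever avg falls below $L$ is an assumption (the original RED pseudo-code also resets its counter in this case).

```python
import random
from typing import Optional


class RedQueue:
    """Simplified RED drop decision using the p_b and p_a formulas above."""

    def __init__(self, L: float, T: float, max_p: float, w_q: float,
                 rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= L < T:
            raise ValueError("need 0 <= L < T")
        self.L, self.T, self.max_p, self.w_q = L, T, max_p, w_q
        self.avg = 0.0   # EWMA of the queue size
        self.count = 0   # packets forwarded since the last drop
        self.rng = rng if rng is not None else random.Random()

    def final_probability(self, avg: float, count: int) -> float:
        """p_b = max_p (avg - L) / (T - L);  p_a = p_b / (1 - count * p_b)."""
        p_b = self.max_p * (avg - self.L) / (self.T - self.L)
        denom = 1.0 - count * p_b
        return 1.0 if denom <= p_b else p_b / denom  # p_a capped at 1

    def on_arrival(self, q: float) -> bool:
        """Update avg with the current queue size q; return True to drop/mark."""
        self.avg = (1.0 - self.w_q) * self.avg + self.w_q * q
        if self.avg < self.L:
            self.count = 0          # no random drops below L (assumed reset)
            return False
        if self.avg >= self.T:
            self.count = 0
            return True             # forced drop at or above the maximum threshold
        p_a = self.final_probability(self.avg, self.count)
        if self.rng.random() < p_a:
            self.count = 0
            return True
        self.count += 1
        return False


if __name__ == "__main__":
    red = RedQueue(L=20, T=60, max_p=0.1, w_q=0.002, rng=random.Random(1))
    drops = sum(red.on_arrival(q=50) for _ in range(10_000))
    print("avg =", round(red.avg, 1), "drops =", drops)
```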

[0044] In this embodiment, the threshold $L$ is set to a fixed value equal to one bandwidth-delay product worth of data. The maximum threshold $T$ is then adjusted dynamically according to the control needs.
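Setting $L$ to one bandwidth-delay product requires expressing that product in the same units as the queue. The sketch below converts link rate and round-trip time into packets; the function name is hypothetical, and the 100 Mb/s link, 100 ms round-trip time and 1500-byte packet size are illustrative assumptions. The same quantity is suggested earlier as $T_{\min}$ in Algorithm 1.

```python
def bdp_packets(link_bps: float, rtt_s: float, pkt_bytes: int = 1500) -> float:
    """Bandwidth-delay product expressed in packets of pkt_bytes bytes."""
    if link_bps <= 0 or rtt_s <= 0 or pkt_bytes <= 0:
        raise ValueError("all arguments must be positive")
    return link_bps * rtt_s / (8 * pkt_bytes)


if __name__ == "__main__":
    L = bdp_packets(100e6, 0.100)
    print(f"L = {L:.1f} packets")  # prints "L = 833.3 packets"
```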

Loss Behavior of the RED Algorithm 

[0045] The temporary packet drop probability p_b used in the RED calculations is plotted as a linear function of the average queue size avg in Figure 5. This temporary packet drop probability p_b can also be expressed as follows:



$$p_b \leftarrow \mathrm{max}_p \cdot \frac{\mathit{avg} - L}{T - L} = \mathrm{max}_p \cdot X$$



That is, p_b is a linear function of the quantity X, 0 ≤ X ≤ 1, which is defined as the normalized buffer fill.
[0046] It is shown in the afore-mentioned article by Floyd et al. that, if each packet is dropped with probability p_b, the interdropping time τ between two packets follows a geometric distribution with parameter p_b and mean E[τ] = 1/p_b, that is, Prob[τ = m] = (1 − p_b)^(m−1) · p_b. The interdropping time τ is the number of packets that arrive after a dropped packet until the next packet is dropped. For example, when max_p = 0.1, 1 out of every 10 packets on average will be dropped when the average queue size is close to the maximum threshold T (i.e., X → 1). This generates an average packet loss rate of 10%. Although one out of every 1/p_b packets is dropped on average, it is shown that this probability does not achieve the goal of dropping packets at fairly regular intervals over short periods. The article by Floyd et al. referenced above therefore proposes the computation of the RED final packet drop probability as follows:



$$p_a \leftarrow \frac{p_b}{1 - \mathit{count} \cdot p_b}$$

The RED final packet drop probability p_a is therefore a rapidly increasing (hyperbolic) function of count, as plotted in Figure 6. The value of p_a increases slowly and then rises dramatically until it reaches p_a = 1 at count = 1/p_b − 1. Note that this quantity changes with each packet arrival and therefore cannot be used as a true reflection of the loss rate, as can the temporary packet-marking probability p_b, or p_d in DRED.

[0047] It is also shown in the afore-mentioned article by Floyd et al. that, when the final packet drop probability p_a is used, the intermarking time τ between two packets follows a uniform distribution with a mean of approximately E[τ] = 1/(2p_b) (as opposed to 1/p_b when the temporary packet drop probability p_b is used). Therefore, when max_p = 0.1, 1 out of every 5 packets will be dropped when the average queue size is close to the maximum threshold T (i.e., X → 1). This generates an average packet loss rate of 20%. Note the factor of 2 in the ratio between the mean intermarking times of the two marking methods for the same average queue size.
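The contrast between the two marking methods in [0046] and [0047] can be checked numerically. The sketch below, with hypothetical function names, computes the exact interdrop distribution when every arrival uses p_a = p_b / (1 − count · p_b), with count defined as above, and compares its mean with the geometric case. Because the product of the survival terms telescopes, each interdrop time m = 1, …, 1/p_b has probability p_b when 1/p_b is an integer, so the mean is (1/p_b + 1)/2, approximately 1/(2p_b).

```python
def red_interdrop_distribution(p_b, eps=1e-12):
    """Exact Prob[tau = m] when arrival m uses p_a = p_b / (1 - count * p_b).

    count is the number of packets forwarded since the last drop, so the
    first arrival after a drop sees count = 0.
    """
    if not 0.0 < p_b <= 1.0:
        raise ValueError("p_b must lie in (0, 1]")
    dist = {}
    survive = 1.0   # probability that no drop has happened yet
    count = 0
    m = 1
    while survive > eps:
        denom = 1.0 - count * p_b
        p_a = 1.0 if denom <= p_b else p_b / denom
        dist[m] = survive * p_a
        survive *= 1.0 - p_a
        count += 1
        m += 1
    return dist


def geometric_mean_interdrop(p_b, horizon=10000):
    """Mean of Prob[tau = m] = (1 - p_b)**(m - 1) * p_b, truncated at horizon."""
    return sum(m * (1.0 - p_b) ** (m - 1) * p_b for m in range(1, horizon + 1))


def mean_of(dist):
    return sum(m * p for m, p in dist.items())


uniform = red_interdrop_distribution(0.1)
print(mean_of(uniform), geometric_mean_interdrop(0.1))
```

With p_b = 0.1, the RED rule gives a mean of about 5.5 packets against about 10 for independent dropping, which is the factor of roughly 2 noted above.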

[0048] As an example, when max_p = 0.1, it can be seen that a packet loss rate of 10% is achieved when the average queue size is halfway between the minimum threshold L and the maximum threshold T, that is, when X = 0.5. If X < 0.5, a loss rate smaller than 10% is achieved, while if X > 0.5, a loss rate higher than 10% is achieved. Now assume, without loss of generality, that the minimum threshold is set to zero. The probability p_b that gives a packet loss rate of θ = 10% in this example (for max_p = 0.1) is given by:

$$p_b = \mathrm{max}_p \cdot \frac{\mathit{avg}}{T} = \mathrm{max}_p \cdot X = \frac{\mathrm{max}_p}{2} \qquad (1)$$
This results in the average intermarking time between two packets (when packets are marked with probability p_a) being

$$E[\tau] = \frac{1}{2 p_b} = \frac{1}{\mathrm{max}_p} \text{ packets.}$$

This generates an average packet loss rate of θ = 1/E[τ] = max_p.

In this example, this corresponds to marking 1 out of every 10 packets, achieving a loss rate of 10%, as expected.

Therefore, as long as the relationship X = 0.5 in equation (1) above holds, a 10% packet loss rate will be enforced. However, this relationship cannot be maintained over time, since the average queue size changes over time (as a function of the system load) while the maximum threshold T is a static parameter in the basic RED technique.

One possible solution is to scale the maximum threshold T in relation to the average queue size avg. That is, to maintain a loss rate θ = 10% with max_p = 0.1 in the above example, T can be varied accordingly to keep the ratio avg/T at 0.5.
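With L = 0, the worked example above reduces to two small formulas: the loss rate is θ ≈ 2p_b = 2 · max_p · X, and holding θ at a target requires X = θ / (2 · max_p), that is, T = avg / X. The minimal sketch below uses hypothetical function names and simply evaluates these relations; it is not a controller.

```python
def loss_rate(max_p, x):
    """Approximate loss rate theta = 1 / E[tau] = 2 * p_b = 2 * max_p * X (L = 0)."""
    return 2.0 * max_p * x


def scaled_max_threshold(avg, target_loss, max_p):
    """Maximum threshold T that keeps avg / T at the ratio giving target_loss."""
    x = target_loss / (2.0 * max_p)     # required normalized buffer fill X
    if not 0.0 < x <= 1.0:
        raise ValueError("target loss not reachable with this max_p")
    return avg / x


# max_p = 0.1: X below, at, and above 0.5 gives loss below, at, and above 10%.
for x in (0.25, 0.5, 0.75):
    print(x, loss_rate(0.1, x))
```

For instance, if the average queue size grows from 40 to 60 packets, `scaled_max_threshold` moves T from 80 to 120 packets to hold the 10% target with max_p = 0.1.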

[0049] Following are some observations regarding the control of packet loss using the RED algorithm: 

(1) Assume that the target loss rate θ_max is 5% and max_p is set to 5%. Then, on average, 1 out of every 20 packets must be marked/dropped. For this to be achieved, the average intermarking time between two packets must be:



$$E[\tau] = \frac{1}{2 p_b} = 20 \text{ packets.}$$

This gives a probability p_b equal to 2.5%. Thus, to achieve the target loss rate, the probability p_b needs to be maintained constant over time. To do so, the ratio between the average queue size avg and the maximum threshold T must be equal to 1/2, as indicated by (1) for max_p set to 5%. If max_p is set to a higher value (for the same target loss rate), then the ratio between the average queue size and the maximum threshold must be smaller, whereas if max_p is set to a lower value, the ratio must be larger. This means that, regardless of the setting of max_p, the ratio avg/T can be varied to obtain a constant p_b. That is, to achieve a target loss rate for any setting of max_p, we want to maintain






$$p_b = \mathrm{max}_p \cdot \frac{\mathit{avg}}{T} = \text{constant}$$



such that the target loss rate θ_max = 1/E[τ] = 2p_b is obtained. This suggests that max_p is not a critical factor; rather, maintaining a constant p_b that results in the target loss rate θ_max = 1/E[τ] = 2p_b is more important.
(2) The RED final packet drop probability p_a is not a good representation of the loss rate, as explained above, since it changes at each packet arrival. However, filtering it (i.e., filtering all those rapidly rising curves) gives an estimate of the loss rate. The RED final packet drop probability p_a is therefore filtered by using an EWMA filter with a gain of γ = 0.00005. That is,

$$\hat{p}_a \leftarrow (1 - \gamma) \cdot \hat{p}_a + \gamma \cdot p_a$$
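The two observations in [0049] can be illustrated with a short sketch. Observation (1) fixes the temporary probability at p_b = θ_max / 2, so that E[τ] = 1/(2p_b) equals 1/θ_max; observation (2) smooths the per-packet p_a with the EWMA filter above, using γ = 0.00005. The names `required_p_b` and `DropProbEstimator` are hypothetical, and the sketch shows only the filter mechanics.

```python
def required_p_b(target_loss):
    """Observation (1): temporary probability p_b that yields theta_max = 2 * p_b."""
    return target_loss / 2.0


class DropProbEstimator:
    """Observation (2): EWMA filter p_hat <- (1 - gamma) * p_hat + gamma * p_a."""

    def __init__(self, gamma=0.00005, initial=0.0):
        self.gamma = gamma      # filter gain gamma
        self.value = initial    # smoothed estimate p_hat

    def update(self, p_a):
        self.value = (1.0 - self.gamma) * self.value + self.gamma * p_a
        return self.value


p_b = required_p_b(0.05)                # target 5% loss -> p_b = 2.5%
mean_interdrop = 1.0 / (2.0 * p_b)      # about 20 packets, as in the example
```

With γ = 0.00005, the filter's time constant is on the order of 1/γ = 20,000 packet arrivals, which is what suppresses the per-packet swings of p_a.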



[0050] This filtered quantity (i.e., the output of the parameter estimator) is input to the controller design block of the high-level controller. This value is used to adapt the maximum threshold T according to the algorithms described earlier. It can be seen that when the traffic load increases or decreases, the average queue size increases or decreases accordingly, changing the quantity X (the ratio between the average queue size avg and the maximum threshold T). Consequently, the probabilities p_b and p_a change, causing a deviation from the target loss rate. The algorithms described earlier are used to detect such deviations so that the threshold T can be changed accordingly. Basically, as illustrated in the previous example, if X increases beyond 1/2 due to a load increase, the filtered quantity will increase beyond the target loss rate of 5%. The maximum threshold T is thus increased, making X converge back to 1/2 and the loss rate converge back to 5%. The same applies when the system load decreases.
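The control algorithms that [0050] refers to are described earlier in the document and are not reproduced in this section. The following is therefore only a hypothetical sketch of such an outer loop, in the spirit of claims 2 and 8: T is changed in proportion to the deviation of the filtered loss estimate from the target, and only after the deviation has stayed outside a tolerance band for several consecutive samples. The class name and the values of `k`, `tolerance`, `hold`, `t_min`, and `t_max` are assumptions.

```python
class ThresholdController:
    """Slow outer loop adapting the maximum threshold T (hypothetical sketch)."""

    def __init__(self, target_loss, t_init, k=1000.0, tolerance=0.005,
                 hold=3, t_min=1.0, t_max=1e6):
        self.target = target_loss   # target loss rate, e.g. 0.05
        self.t = t_init             # current maximum threshold T (packets)
        self.k = k                  # assumed gain: packets per unit of loss deviation
        self.tolerance = tolerance  # assumed dead band around the target
        self.hold = hold            # assumed number of consecutive out-of-band samples
        self.t_min, self.t_max = t_min, t_max
        self._run = 0               # consecutive out-of-band samples seen so far

    def sample(self, filtered_loss):
        """Feed one filtered loss estimate; return the (possibly updated) T."""
        deviation = filtered_loss - self.target
        self._run = self._run + 1 if abs(deviation) > self.tolerance else 0
        if self._run >= self.hold:
            # Loss above target means X = avg/T is too large, so raise T;
            # loss below target lowers T. The change is proportional to the deviation.
            self.t = min(max(self.t + self.k * deviation, self.t_min), self.t_max)
            self._run = 0
        return self.t
```

In use, the value produced by the EWMA estimator would be passed to `sample` at the controller's slower update interval, and the returned T written back into the RED queue.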
Claims



1. In a packet network, a method of managing a buffer at an outgoing link of a node, comprising steps of:
monitoring the status of a queue in relation to a queue threshold;

generating congestion notifications to data sources in response to the status of the queue;
computing an indication concerning a system load of the node from the rate of congestion notifications; and
adjusting the queue threshold in response to the indication to keep the operation of the data sources within a
preferred envelope.

2. The method of managing a buffer at an outgoing link of a node according to claim 1, further comprising steps of:
determining a deviation of the indication from a target value; and

adjusting the queue threshold if the deviation is larger than a predetermined value for at least a predetermined duration of time.

3. The method of managing a buffer at an outgoing link of a node according to claim 2, comprising a further step of:

monitoring the congestion notifications over time to derive a congestion notification rate parameter.

4. The method of managing a buffer at an outgoing link of a node according to claim 2, comprising a further step of:

monitoring the congestion notifications at a predetermined sampling interval to derive a congestion notification rate parameter.

5. The method of managing a buffer at an outgoing link of a node according to claim 2, comprising further steps of: 

performing a random early detection buffer management process; 
observing variables of the process; and 

computing from the variables the indication concerning the system load of the node. 

6. The method of managing a buffer at an outgoing link of a node according to claim 3, further comprising steps of: 

monitoring a current queue size; 

computing an error signal in response to the current queue size and the queue threshold; and

computing a current congestion notification probability as the congestion notification rate parameter using the error signal.

7. The method of managing a buffer at an outgoing link of a node according to claim 6, further comprising a step of:

filtering the current congestion notification probability to generate a smoothed congestion notification probability by using an exponentially weighted moving average filter with a predetermined gain.

8. The method of managing a buffer at an outgoing link of a node according to claim 7, wherein there are two queue thresholds and one being set to the other with a predetermined linear relationship, the method further comprising a step of:

adjusting one of the two queue thresholds in proportion to the deviation.

9. The method of managing a buffer at an outgoing link of a node according to claim 8, wherein the step of adjusting 
the queue threshold is performed through the use of a smoothing block.

10. The method of managing a buffer at an outgoing link of a node according to claim 4, further comprising steps of:

monitoring a current queue size;

computing an error signal in response to the current queue size and the queue threshold; and

computing a current congestion notification probability as the congestion notification rate parameter using the error signal.

11. The method of managing a buffer at an outgoing link of a node according to claim 10, further comprising a step of:

filtering the current packet congestion notification probability to generate a smoothed congestion notification probability by using an exponentially weighted moving average filter with a predetermined gain.

12. The method of managing a buffer at an outgoing link of a node according to claim 11, wherein there are two queue
thresholds and one being set to the other with a predetermined linear relationship, the method further comprising 
a step of: 

adjusting one of the two queue thresholds in proportion to the deviation. 

13. The method of managing a buffer at an outgoing link of a node according to claim 12, wherein the step of adjusting
the queue threshold is performed through the use of a smoothing block.

14. The method of managing a buffer at an outgoing link of a node according to claim 5, further comprising steps of: 

monitoring a current queue size; 

computing an error signal in response to the current queue size and the queue threshold;

computing a current congestion notification probability as the congestion notification rate parameter using the error signal; and

using the congestion notification rate parameter as the variable from which the indication concerning the system load of the node is computed.

15. The method of managing a buffer at an outgoing link of a node according to claim 14, further comprising a step of:

filtering the current congestion notification probability to generate a smoothed congestion notification prob- 
ability by using an exponentially weighted moving average filter with a predetermined gain. 

16. The method of managing a buffer at an outgoing link of a node according to claim 15, wherein there are two queue 
thresholds and one being set to the other with a predetermined linear relationship, the method further comprising 
a step of: 

adjusting one of the two queue thresholds in proportion to the deviation. 

17. The method of managing a buffer at an outgoing link of a node according to claim 16, wherein the step of adjusting
the queue threshold is performed through the use of a smoothing block.

18. The method of managing a buffer at an outgoing link of a node according to claim 3, further comprising steps of: 

monitoring an average queue size over time; 
counting the number of forwarded packets; and 

computing a current congestion notification probability as the congestion notification rate parameter using the
count and the average queue size in relation to the queue threshold.

19. The method of managing a buffer at an outgoing link of a node according to claim 18, further comprising a step of:

filtering the current packet congestion notification probability to generate a smoothed packet congestion 
notification probability by using an exponentially weighted moving average filter with a predetermined gain. 

20. The method of managing a buffer at an outgoing link of a node according to claim 19, wherein there are two queue
thresholds and one being set to the other with a predetermined linear relationship, the method further comprising 
a step of: 

adjusting one of the two queue thresholds in proportion to the deviation. 

21. The method of managing a buffer at an outgoing link of a node according to claim 20, wherein the step of adjusting
the queue threshold is performed through the use of a smoothing block.

22. The method of managing a buffer at an outgoing link of a node according to claim 4, further comprising steps of: 

monitoring an average queue size over time; 
counting the number of forwarded packets; and 

computing a current congestion notification probability as the congestion notification rate parameter using the
count and the average queue size in relation to the queue threshold.

23. The method of managing a buffer at an outgoing link of a node according to claim 22, further comprising a step of: 
filtering the current congestion notification probability to generate a smoothed congestion notification probability by using an exponentially weighted moving average filter with a predetermined gain.

24. The method of managing a buffer at an outgoing link of a node according to claim 23, wherein there are two queue thresholds and one being set to the other with a predetermined linear relationship, the method further comprising a step of:

adjusting one of the two queue thresholds in proportion to the deviation.

25. The method of managing a buffer at an outgoing link of a node according to claim 24, wherein the step of adjusting 
the queue threshold is performed through the use of a smoothing block.

26. The method of managing a buffer at an outgoing link of a node according to claim 5, further comprising steps of:

monitoring an average queue size over time;
counting the number of forwarded packets;

computing a current congestion notification probability as the congestion notification rate parameter using the count and the average queue size in relation to the queue threshold; and

using the congestion notification rate parameter as the variable from which the indication concerning the system load of the node is computed.

27. The method of managing a buffer at an outgoing link of a node according to claim 26, further comprising a step of:

filtering the current packet congestion notification probability to generate a smoothed packet congestion notification probability by using an exponentially weighted moving average filter with a predetermined gain.

28. The method of managing a buffer at an outgoing link of a node according to claim 27, wherein there are two queue thresholds and one being set to the other with a predetermined linear relationship, the method further comprising a step of:

adjusting one of the two queue thresholds in proportion to the deviation.

29. The method of managing a buffer at an outgoing link of a node according to claim 28, wherein the step of adjusting 
the queue threshold is performed through the use of a smoothing block.

30. A mechanism for managing a buffer at an outgoing link of a node in a packet network comprising: 

a queue for buffering packets to be transmitted from the node onto the outgoing link;

a first controller for monitoring the status of the queue with respect to a first queue threshold and generating congestion notifications to data sources in response to the status of the queue;

a parameter estimation block for generating an indication concerning a system load of the node; and

a second controller for adjusting the first queue threshold in response to the indication to keep the operation of the data sources within a preferred envelope.

31. The mechanism for managing a buffer at an outgoing link of a node in a packet network according to claim 30, wherein the first controller further comprises a congestion notification module for generating congestion notifications in accordance with the queue status.

32. The mechanism for managing a buffer at an outgoing link of a node in a packet network according to claim 31, wherein the first controller operates at a faster speed than the second controller.

33. The mechanism for managing a buffer at an outgoing link of a node in a packet network according to claim 30, wherein the parameter estimation block further comprises a congestion notification module for computing a congestion notification probability, and an exponentially weighted moving average filter with a predetermined gain for removing the transient component of the congestion notification probability.

34. The mechanism for managing a buffer at an outgoing link of a node in a packet network according to claim 30, wherein the second controller further comprises a smoothing block for smoothing the adjustment of the first queue threshold.

35. The mechanism for managing a buffer at an outgoing link of a node in a packet network according to claim 32, wherein the smoothing block is either a low-pass filter or a ramp unit.

36. The mechanism for managing a buffer at an outgoing link of a node in a packet network according to claim 30,
further comprising a second queue threshold which has a predetermined linear relationship with the first queue 
threshold. 
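Claims 34 and 35 name a smoothing block, either a low-pass filter or a ramp unit, between the second controller and the threshold it adjusts. Both options are sketched below with hypothetical function names and parameters (`alpha` for the filter, `max_step` for the ramp); the claims do not specify their form or values, so a first-order filter and a rate-limited step are assumptions.

```python
def low_pass_step(current, commanded, alpha):
    """First-order low-pass filter: move a fraction alpha toward the commanded T."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return current + alpha * (commanded - current)


def ramp_step(current, commanded, max_step):
    """Ramp unit: move toward the commanded T by at most max_step per update."""
    delta = commanded - current
    delta = max(-max_step, min(max_step, delta))
    return current + delta


# Smooth a jump of the commanded threshold from 100 to 200 packets.
t_lp = t_ramp = 100.0
for _ in range(5):
    t_lp = low_pass_step(t_lp, 200.0, alpha=0.25)
    t_ramp = ramp_step(t_ramp, 200.0, max_step=10.0)
```

The low-pass filter approaches the new value geometrically, while the ramp bounds the rate of change; either way, a large correction from the slow controller is applied gradually rather than as a step.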